narrative planning problem
Can LLMs Generate Good Stories? Insights and Challenges from a Narrative Planning Perspective
Story generation has been a prominent application of Large Language Models (LLMs). However, understanding of LLMs' ability to produce high-quality stories remains limited due to challenges in automatic evaluation methods and the high cost and subjectivity of manual evaluation. Computational narratology offers valuable insights into what constitutes a good story, and these insights have been applied in the symbolic narrative planning approach to story generation. This work aims to deepen the understanding of LLMs' story generation capabilities by using them to solve narrative planning problems. We present a benchmark for evaluating LLMs on narrative planning based on literature examples, focusing on causal soundness, character intentionality, and dramatic conflict. Our experiments show that GPT-4 tier LLMs can generate causally sound stories at small scales, but planning with character intentionality and dramatic conflict remains challenging, requiring LLMs trained with reinforcement learning for complex reasoning. The results offer insights into the scale of stories that LLMs can generate while maintaining quality across these aspects. Our findings also highlight interesting problem-solving behaviors and shed light on challenges and considerations for applying LLM narrative planning in game environments.
Ware
Glaive is a state-space planner based on Hoffmann and Nebel's Fast-Forward which solves the narrative planning problem defined by Riedl and Young -- to construct a plan which achieves the author's goals out of steps which are clearly motivated and goal-oriented toward individual character goals. Glaive reasons about how characters cooperate and conflict based on causal structures and possible worlds. By leveraging the unique constraints of narrative planning, Glaive reduces its branching factor and calculates a more accurate heuristic. We evaluate it on 8 narrative planning problems and demonstrate that it can solve certain non-trivial problems in under 1 second.
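The narrative planning problem described above can be illustrated with a toy forward search. The following is a minimal sketch and not Glaive's algorithm: it uses plain breadth-first search with no Fast-Forward heuristic, and its `is_motivated` check is a crude stand-in for intentionality that only asks whether the acting character's goal is unmet before a step and met at some later state, ignoring causal links. The domain (a thief, a guard, a treasure) and all names (`Action`, `narrative_plan`, `is_motivated`) are hypothetical and chosen for illustration.

```python
from collections import deque
from typing import NamedTuple


class Action(NamedTuple):
    name: str
    actor: str            # the character who must intend this step
    pre: frozenset        # facts that must hold before the step
    add: frozenset        # facts the step makes true
    delete: frozenset     # facts the step makes false


def apply(state, action):
    """Return the successor state after executing the action."""
    return (state - action.delete) | action.add


def is_motivated(plan, init, char_goals):
    """Crude intentionality check (an assumption of this sketch):
    every step's actor has a goal that is unmet before the step
    and holds in some state reached at or after the step."""
    states = [init]
    for a in plan:
        states.append(apply(states[-1], a))
    for i, a in enumerate(plan):
        goal = char_goals.get(a.actor, frozenset())
        if not goal or goal <= states[i]:
            return False
        if not any(goal <= s for s in states[i + 1:]):
            return False
    return True


def narrative_plan(init, author_goal, char_goals, actions, max_len=6):
    """Breadth-first search for a plan that reaches the author's goal
    and in which every step passes the motivation check."""
    queue = deque([(init, [])])
    while queue:
        state, plan = queue.popleft()
        if author_goal <= state and is_motivated(plan, init, char_goals):
            return plan
        if len(plan) >= max_len:
            continue
        for a in actions:
            if a.pre <= state:
                queue.append((apply(state, a), plan + [a]))
    return None


f = frozenset
actions = [
    Action("steal", "thief", f({"treasure_in_vault", "thief_free"}),
           f({"thief_has_treasure"}), f({"treasure_in_vault"})),
    Action("arrest", "guard", f({"thief_has_treasure", "thief_free"}),
           f({"thief_jailed"}), f({"thief_free"})),
    # Reaches the author's goal directly, but serves no goal of the thief.
    Action("surrender", "thief", f({"thief_free"}),
           f({"thief_jailed"}), f({"thief_free"})),
]
init = f({"treasure_in_vault", "thief_free"})
author_goal = f({"thief_jailed"})
char_goals = {"thief": f({"thief_has_treasure"}), "guard": f({"thief_jailed"})}

plan = narrative_plan(init, author_goal, char_goals, actions)
print([a.name for a in plan])
```

In this toy domain the one-step plan `surrender` satisfies the author's goal but is rejected because it advances no goal of the thief, so the search returns the longer plan in which the thief steals and the guard arrests. This reflects the distinction the abstract draws between a plan that merely achieves the author's goals and one whose steps are motivated by individual character goals.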